1. Field of the Invention
The present invention relates to a system for retrieving documents stored in a directory structure (hierarchical structure) created on the Internet, and more particularly to a system that performs retrieval across plural directory structures created for different languages.
2. Description of the Related Art
With the rapid growth in the number of Internet users, use of the Internet for business is expanding. To facilitate access to the large volumes of documents accumulated on WWW servers, directory services are provided that define a directory structure and store each document in an appropriate directory. With such a service, a user reaches a desired document by starting at the top directory and successively following the subdirectories closest to his or her interest. However, the user cannot always select the optimal subdirectory, so in most cases retrieval technologies such as full-text retrieval are used as well to increase the chance of reaching the desired document.
Numerous multilingual information retrieval methods have been proposed to perform retrieval across different languages. For example, “Automatic cross-linguistic information retrieval using Latent Semantic Indexing” by Dumais, S. T., Landauer, T. K. and Littman, M. L., In Proceedings of the SIGIR'96 Workshop on Cross-Linguistic Information Retrieval, pp. 16–23, August 1996, proposes achieving multilingual information retrieval by applying latent semantic indexing to a set of translated text pairs (a parallel corpus). Latent semantic indexing itself is described in detail in “Indexing by latent semantic analysis” by Deerwester, S., Dumais, S. T., Landauer, T. K., Furnas, G. W. and Harshman, R. A., Journal of the American Society for Information Science, 41(6), 391–407, 1990. Another typical example of multilingual information retrieval technology is the method proposed in “Query translation using evolutionary programming for multilingual information retrieval” by Mark W. Davis and Ted E. Dunning, In Proceedings of the Fourth Annual Conference on Evolutionary Programming, March 1995. Further, as described in “The mathematics of statistical machine translation: Parameter estimation” by Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra and Robert L. Mercer, Computational Linguistics, 19(2):263–311, 1993, research has been actively conducted on methods that build machine translation from parallel corpora; with such methods, a retrieval request written in a first language is machine-translated into a second language so that documents written in the second language can be retrieved.
However, at present these multilingual information retrieval methods can hardly be said to provide sufficient retrieval precision for actual business systems. The main cause of reduced precision in multilingual information retrieval is the ambiguity in meaning of words and phrases. In general, a word (or phrase) of a first language has many translation candidates in a second language. For example, the English word “base” has various field-dependent translation candidates, such as “a supply center for a large force of military personnel” as a military term, “any one of the four corners of an infield” as a baseball term, “a main body for supportive activities” as a political term, “digit” as a mathematical term, “alkali” as a chemistry term, “a morpheme or morphemes regarded as a form to which affixes or other bases may be added” as a linguistic term, and “the main element of a mixture” as a building term. Because these translation candidates in most cases depend on the field, high precision is expected to be obtained if the retrieval target of multilingual information retrieval is limited to a document set of a specific field.
In most cases, a directory service is first launched in a specific country and language, and the directory structure used there is then carried over without modification to other countries and languages so that the same directory service can be offered. However, directory services operated in different countries are independent of one another, so only documents within a single directory structure can be retrieved, and documents in the directory structures of other countries and languages cannot be obtained as retrieval results. In business-oriented directory services such as Internet sales and auction sites in particular, it is important that documents of other countries and languages can be properly retrieved. At present, therefore, many potential business opportunities are lost.